LSTM layer support #8
Totally agree. Are you interested in implementing it? :-) edit: RNNs have also been asked for here.
I was about to write about the same and then found this issue thread. I think it will be a great addition. If you agree, I can try my luck. Did not do C++ in a while, but it would be great to go back.
@rcshubhadeep That would be awesome! Once the algorithm works, we can adjust the C++ style afterwards. So if you'd like to start implementing, here are the steps that are needed:
When you have all this, your new layer type will automatically be tested (i.e. its results compared with the Keras implementation) in the unit tests. You can run the tests locally as described in If any questions arise, just let me know. :)
Here to say, I am sorry, but a newborn kid and the pressure of my present job are presently preventing me from taking up this venture. Would have loved to.
No problem at all. Having kids is one of the few things that is even more awesome than implementing deep-learning layers. 😉 So I totally understand.
@Dobiasd I briefly looked into implementing the SimpleRNN recurrent layer; the weight layout of the model seems simple and exporting the weights wouldn't be hard. I did the forward pass within Python using numpy and it agrees with the Keras predictions. Next steps would be to follow your outline above to have a functional layer. I am not clear yet how to handle the extra time dimension in the data though... Regarding recurrent layers in general, the input shape has another dimension for time, i.e.
@chammika Wow, good work on implementing it in Python already! Yes, it seems like we need a Do stateful RNN layers need a fifth dimension (
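For reference, the forward pass described above can be sketched in plain Python. This is a toy sketch, not the actual implementation; the weight names `W`, `U`, `b`, the zero initial state, and the default tanh activation follow Keras' SimpleRNN conventions:

```python
import math

def simple_rnn_forward(xs, W, U, b, return_sequences=False):
    """SimpleRNN forward pass: h[t] = tanh(x[t]*W + h[t-1]*U + b).

    xs: list of input vectors (the extra time dimension),
    W:  input kernel (input_dim x units),
    U:  recurrent kernel (units x units),
    b:  bias (units); the initial state is all zeros, as in Keras.
    """
    units = len(b)
    h = [0.0] * units
    outputs = []
    for x in xs:
        h_prev = h
        h = [math.tanh(sum(x[i] * W[i][j] for i in range(len(x)))
                       + sum(h_prev[k] * U[k][j] for k in range(units))
                       + b[j])
             for j in range(units)]
        outputs.append(h)
    # return_sequences=True yields one output per time step (for stacking
    # recurrent layers); otherwise only the last step (e.g. for a Dense layer).
    return outputs if return_sequences else outputs[-1]
```

The extra loop over `xs` is exactly where the additional time dimension shows up compared to the existing feed-forward layers.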
If we are to add
When Note that the first SimpleRNN returns a sequence, which can be fed to an RNN, whereas the second one does not / cannot if we are to connect a Dense layer next. I could replicate the Keras predictions in numpy as before.
There is no need Only the above cases will have
I am not clear about this yet. According to the above example it doesn't. Changing the https://fairyonice.github.io/Understand-Keras's-RNN-behind-the-scenes-with-a-sin-wave-example.html Enlighten me if you figure this out 😃
Yeah, but if one layer type takes I guess we need some kind of unification, or do you see a loophole somewhere? 🙂
😕
Leaving the stateful question aside, perhaps there is no problem with "tensor4". Currently |
Yes, that would be the best solution to keep the current layer API intact. In that case https://stackoverflow.com/questions/42763928/how-to-use-model-reset-states-in-keras
If we flatten batches into one sequence we can have the same effect as having the stateful predictions. In that case we might want to update the API to
Pass the appropriate
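The flattening idea above can be illustrated with a toy recurrence (a hypothetical state update, not frugally-deep code): carrying the final state into the next call produces the same outputs as one long sequence, which is exactly the effect of stateful prediction.

```python
def run(xs, state=0.0):
    """Toy recurrent layer: each output depends on a carried state."""
    outputs = []
    for x in xs:
        state = 0.5 * state + x  # hypothetical state update
        outputs.append(state)
    return outputs, state

# One long (flattened) sequence...
full, _ = run([1.0, 2.0, 3.0, 4.0])

# ...matches two chunked calls when the final state is carried over,
# which is what stateful prediction does:
out1, s = run([1.0, 2.0])
out2, _ = run([3.0, 4.0], state=s)
assert full == out1 + out2

# Resetting the state between chunks (stateless behavior) diverges:
out2_reset, _ = run([3.0, 4.0])
assert full[2:] != out2_reset
```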
Sounds good. 👍 One question though: You mean changing the interface of Since frugally-deep does not support any kind of batching (not counting parallel predictions) I'd like to keep the word batch out. Do you think it is possible to leave the interface as it is, and just assume (or calculate) the batch size if needed inside the LSTM-layer implementation?
It's possible to keep the interface as it is for
Sounds like a very good plan. 👍 Let me know if you have any questions regarding the integrations or if I can be of any other kind of help. 🙂
@Dobiasd First of all: keep up the good work, I think this project is really useful! I recently got my first job at an audio software company and it's my task to implement the I took a shot at implementing the For testing, I manually copied the weights from the keras model and fed them into the function. Output values are reasonably close to keras output. Actual implementation: https://gist.github.com/n-Guard/50a64f4ab837b06777263758b15e6118 Next thing would obviously be implementing the model conversion to be able to actually test random cases. I'm looking forward to get some feedback! |
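For comparison, one time step of the standard LSTM formulation (the same equations Keras uses) can be sketched with scalar toy weights. The names `W`, `U`, `b` and the gate ordering here are illustrative, not taken from the linked gist:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM time step with scalar toy weights.

    W, U, b each hold four values, for the input (i), forget (f),
    cell-candidate (g) and output (o) gates.
    """
    i = sigmoid(W[0] * x + U[0] * h + b[0])    # input gate
    f = sigmoid(W[1] * x + U[1] * h + b[1])    # forget gate
    g = math.tanh(W[2] * x + U[2] * h + b[2])  # cell candidate
    o = sigmoid(W[3] * x + U[3] * h + b[3])    # output gate
    c_new = f * c + i * g          # updated cell state
    h_new = o * math.tanh(c_new)   # updated hidden state / output
    return h_new, c_new

# With all-zero weights the candidate g is zero, so state and output stay zero:
print(lstm_step(1.0, 0.0, 0.0, [0.0] * 4, [0.0] * 4, [0.0] * 4))  # (0.0, 0.0)
```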
@n-Guard Hi, and thanks for the nice feedback and this good contribution. Code cleanup regarding C++ style can be done later. :) The more important thing from my POV is that we now seem to have two people working on the same feature. Maybe to further improve quality (and to not waste your time, too) a collaboration of you two would be reasonable? @chammika / @chammika-become What do you think?
@n-Guard Great work! 👍 Could you also post a link to the python model that comes up with the same predictions for ease of comparison. I particularly want to see how the
OK, cool. @n-Guard now that this part is clear, feel free to open a PR with your work if you like, so we can view it and so that you have automatic tests in the CI etc.
@chammika-become Yes, it is indeed better to keep @Dobiasd Yes, I will soon open a PR :)
I did not yet look into it. The main reason might be historical. For quite a long time
Ah, so if I understand it correctly, you would have a vector with elements representing multiple
Yes, exactly. And to me this sounds more intuitive. On the other hand, I don't know much about LSTMs. So if this poses a problem, feel free to use an alternative approach.
After doing some research, I don't think there is a problem specific to LSTMs, since the Keras LSTM layer (and RNN layers in general) only accepts a 3D tensor as input (
If I understand correctly, we basically can choose which one of our four dimensions ( So we might take the option that translates the simplest from what Keras does, i.e., the way that avoids conversions ( Naturally this should then automatically be what would be most intuitive for our users to create when calling
Ok, I think we can definitely agree on having But: Since the
Good catch! Is it possible for us at all to "just" do it like Keras does, or is this in conflict with how we handle tensors in general? (It might be that I don't understand the problem fully, so please correct me if needed. 🙂)
Right. Up to now we only forward the raw tensors. It probably would result in some modifications in The basic idea of these functions is to not push the data through the model from front to end, but instead pull it out from the end. This "pull" then propagates through the computational graph up to the input layer(s). One advantage of this is the following. Consider we have such a graph (
Pushing from
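The pull-based evaluation described above can be sketched minimally (illustrative Python, not the actual frugally-deep internals): each node asks its inputs for their outputs and memoizes the result, so a shared upstream node is computed only once even when it has multiple consumers.

```python
class Node:
    """Minimal pull-based graph node: output is requested ("pulled")
    from the end, and memoized so shared subgraphs run only once."""

    def __init__(self, name, inputs=()):
        self.name = name
        self.inputs = list(inputs)
        self.eval_count = 0  # how often this node actually computed

    def get_output(self, cache):
        if self.name not in cache:
            # Pull from upstream nodes first, then compute and memoize.
            upstream = [n.get_output(cache) for n in self.inputs]
            self.eval_count += 1
            cache[self.name] = (self.name, upstream)
        return cache[self.name]

# A feeds both B and C, which feed D -- pulling D's output evaluates A once:
a = Node("A")
b = Node("B", [a])
c = Node("C", [a])
d = Node("D", [b, c])
d.get_output({})
assert a.eval_count == 1  # computed once despite two consumers
```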
Moved our conversation to a new issue. :)
Hi guys, For the project I'm working on I need a stateful LSTM (and a way to reset the internal cell state). Is this implemented in any of your functions? Unfortunately, my knowledge of C++ is limited and I don't feel I can help with it.
@marco-monforte So you are looking for something like this in frugally-deep?

const auto result_1 = model.predict(some_input_tensors);
model.reset_some_hidden_LSTM_state();
const auto result_2 = model.predict(some_other_input_tensors);
@Dobiasd yes! And if I input two consecutive tensors, the hidden LSTM state is preserved from the first call to the second.
Currently something like that is not supported.
No, it's a different concept from the When we run a prediction, the LSTM network output is built upon an internal state of the cells, obtained from the gates. What the model outputs, however, is just the former piece of information, and the state is reset after each prediction. In some cases, it's useful to keep this cell state intact so that the next prediction will also depend on the previous one, but then we have to manually control this resetting of the memory. It is something used in particular cases, also because of the "danger" associated with this memory preservation. Thanks anyway, the library is really great! Hopefully someone will add this feature soon :)
@Dobiasd - what @marco-monforte is asking for is very useful. A common case is that an RNN is trained with input: (batch_size, sequence_length, features_dim) and stateful=False, return_sequences=True. Then you can convert the trained model to another "streaming" keras model that has input: (1, 1, features_dim) and stateful=True, return_sequences=False. This streaming model then runs with an indefinite sequence length, remembering its past. If you want to start over, you reset the states explicitly. Since for many cases the state is the previous output (e.g., simpleRNN, GRU), perhaps a solution could be passing the input and the state (last input)? Sorry, I also am not a C++ expert...
Thanks for the explanation. Could you give a minimal Keras code example that does it?
Below is an example. The feature you added to the LSTM layer to set the state could be used to make this work. The state would have to be returned and then sent back as an input for the next step. It would be cleaner if the recurrent layers had a "stateful" (remember state) option and reset_states(). Thanks for frugally-deep!

import h5py
import keras
import numpy as np
from keras.layers import Input, Dense, GRU
##### generate toy data
train_seq_length = 4
feature_dim = 2
num_seqs = 8
x = np.random.randint(0, high=2, size = (num_seqs * train_seq_length, feature_dim) )
x = np.sign( x - 0.5 )
y = np.sum( ( x == np.roll(x, 1, axis = 0) ), axis = 1 )
### y[n] = number of agreements between x[n], x[n-1]
x = x.reshape( (num_seqs, train_seq_length, feature_dim) )
y = y.reshape( (num_seqs, train_seq_length, 1) )
###### Define/Build/Train Training Model
training_in_shape = x.shape[1:]
training_in = Input(shape=training_in_shape)
# training_in = Input(batch_shape=(None,train_seq_length,feature_dim)) this works too
foo = GRU(4, return_sequences=True, stateful=False)(training_in)
training_pred = Dense(1)(foo)
training_model = keras.Model(inputs=training_in, outputs=training_pred)
training_model.compile(loss='mean_squared_error', optimizer='adam')
training_model.summary()
training_model.fit(x, y, batch_size=2, epochs=10)
##### define the streaming-inference model
streaming_in = Input(batch_shape=(1,1,feature_dim)) ## stateful ==> needs batch_shape specified
foo = GRU(4, return_sequences=False, stateful=True )(streaming_in)
streaming_pred = Dense(1)(foo)
streaming_model = keras.Model(inputs=streaming_in, outputs=streaming_pred)
streaming_model.compile(loss='mean_squared_error', optimizer='adam')
streaming_model.summary()
##### copy the weights from trained model to streaming-inference model
training_model.save_weights('weights.hd5', overwrite=True)
streaming_model.load_weights('weights.hd5')
##### demo the behavior
print('\n\n******the streaming-inference model can replicate the sequence-based trained model:\n')
for s in range(num_seqs):
    print(f'\n\nRunning Sequence {s} with STATE RESET:\n')
    in_seq = x[s].reshape( (1, train_seq_length, feature_dim) )
    seq_pred = training_model.predict(in_seq)
    seq_pred = seq_pred.reshape(train_seq_length)
    for n in range(train_seq_length):
        in_feature_vector = x[s][n].reshape(1,1,feature_dim)
        single_pred = streaming_model.predict(in_feature_vector)[0][0]
        print(f'Seq-model Prediction, Streaming-Model Prediction, difference [{n}]: {seq_pred[n] : 3.2f}, {single_pred : 3.2f}, {seq_pred[n] - single_pred: 3.2f}')
    streaming_model.reset_states()
print('\n\n******streaming-inference state needs reset between sequences to replicate sequence-based trained model:\n')
for s in range(num_seqs):
    print(f'\n\nRunning Sequence {s} with NO STATE RESET:\n')
    in_seq = x[s].reshape( (1, train_seq_length, feature_dim) )
    seq_pred = training_model.predict(in_seq)
    seq_pred = seq_pred.reshape(train_seq_length)
    for n in range(train_seq_length):
        in_feature_vector = x[s][n].reshape(1,1,feature_dim)
        single_pred = streaming_model.predict(in_feature_vector)[0][0]
        print(f'Seq-model Prediction, Streaming-Model Prediction, difference [{n}]: {seq_pred[n] : 3.2f}, {single_pred : 3.2f}, {seq_pred[n] - single_pred: 3.2f}')
    #### NO STATE RESET HERE: streaming model will treat multiple sequences as one long sequence,
    #### so after the first sequence, the streaming output will differ; the difference will decay with time as the effect of the initial state fades
for s in range(2):
    N = np.random.randint(1, 10)
    print(f'\n\n******streaming-inference can work on sequences of indefinite length -- running length {N}:\n')
    for n in range(N):
        x_sample = np.random.randint(0, high=2, size = ( 1, 1, feature_dim) )
        x_sample = np.sign( x_sample - 0.5 )
        single_pred = streaming_model.predict(x_sample)[0][0]
        print(f'Streaming-Model Prediction[{n}]: {single_pred : 3.2f}')
    streaming_model.reset_states()
Thanks a lot. I'm still in the process of trying to understand it, but you seem to understand it already. Also, you write good code. Would you be interested in trying to implement it in frugally-deep in a PR? I'd prefer the cleaner solution you proposed. If I understand correctly, we would need to give up the
Regarding the code sample, the main point is that the training_model is trained using sequences and w/o statefulness, so each sequence is a separate training sample and starts from the zero state. The streaming model is the same as the training model, except: (i) the input shape is just one time sample (sequence length 1) and (ii) it is stateful, meaning that each call picks up from where the last left off. You need to use sequences of a fixed length to train, but you may want to run the trained model on a time series of indefinite length, and the streaming model does that.

I am happy to help with adding this feature to the recurrent layers, but I am just starting to familiarize myself with frugally-deep and am a C++ novice -- I don't really get all of the header-only and lambda stuff. Despite your nice comment, I am not much of a programmer. BTW, if you have any good resources to get up to speed on the C++ approaches used in fd, please share (I am starting from ansi C and some C++). I understand the recurrent layer functionality and math pretty well though, so can help on that front...

If the recurrent layer were a class in C++, then the feature could be implemented with (i) a private vector that is the state (in LSTMs, this is typically represented as two state vectors), (ii) a public function that would allow you to set/reset the state, and (iii) another private variable defining if the layer is stateful (i.e., if the state is reset to 0 at each call or if it retains the previous state). I think what you are saying is that your current implementation does not allow for state in model::predict (for thread-safety reasons?) and that you could accomplish the vanilla-C++ approach above using the thread_local storage method. I don't really understand any of that.. ;-) -- but yeah, if that accomplishes the same thing with the benefits you mention, great!
BTW, in keras, the reset_states() for a model resets the states of all recurrent layers, and the state of a given layer can be set to a specific value using: my_model.layers[i].reset_states() # sets to zero
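The layer design sketched above (a private state, a stateful flag, and reset_states()) might look roughly like this; Python is used here only for brevity, with a hypothetical scalar recurrence standing in for the real LSTM math:

```python
class ToyRecurrentLayer:
    """Sketch of a stateful recurrent layer: when stateful is True the
    state survives between calls; otherwise it is reset on every call."""

    def __init__(self, stateful=False):
        self.stateful = stateful
        self._state = 0.0  # private state (an LSTM would hold h and c here)

    def reset_states(self):
        self._state = 0.0

    def __call__(self, xs):
        if not self.stateful:
            self.reset_states()  # Keras-like stateful=False behavior
        outputs = []
        for x in xs:
            self._state = 0.5 * self._state + x  # hypothetical recurrence
            outputs.append(self._state)
        return outputs

layer = ToyRecurrentLayer(stateful=True)
first = layer([1.0, 2.0])  # state carried forward...
second = layer([3.0])      # ...into this call
layer.reset_states()       # explicit reset, as in Keras' reset_states()
```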
Ah, OK, understood.
Yes, it is a class in C++. Currently, the model class does not provide public access to the layers. Also, it only stores them as base-class ( One way, that allows us to provide a supple way for the user of Would
Yeah, basically the approach with the state variables in the LSTM-layer class would become threadsafe when we simply declare them as Once we are clear about the questions from above, maybe we could share the work like the following:
Thanks. I am ok to write the tests in I am starting to understand your code a little bit better -- thanks for the pointers!

I don't think that just having a reset_states() functionality is what we need. Essentially the default (stateful=False) behavior in keras and your code is to reset_states() at the start of each call to model::predict. What we need is a method to remember states between model.predict() calls. Having a reset_states() function in addition to that would be good too.

My understanding is that it is not simple to add a private variable to the LSTM/GRU layers that would capture the state, because model::predict is const, meaning that it cannot set any variables in the classes/subclasses. However, following your lead, can we make the entire model have: (i) a stateful bool variable that would apply to all layers and (ii) a reset_states() that would apply to all layers? Having granularity to the states of each layer individually would be good, but it is not needed to cover the case that started this conversation, and I think it is not needed for 99% of applications.

I see that your lstm_impl() function takes in initial states (for h, c) and has a return_state option. This should be enough to implement the desired functionality if state variables can be held in the model class. We should add consistent functionality to all recurrent layers (looks like you just have GRU and LSTM, but we could add simpleRNN).

Below is pseudo code for what I mean. I am not clear on how the state and sequence are stored in the return tensor for LSTM when return_state is true, so this is not precise. The basic idea is to pass and return the state for each layer when either the model is stateful or the layer has return_state = True, but pass the state to the next layer (or output) only if return_state = True. In this approach, there are state variables in the model class, but not in the layer classes.
Not sure that this accomplishes a path to a thread-safe implementation, because the new predict function needs to update the state variable of the model...
Below is some more detail about what is going on with the "stateful" setting in keras:

Think of an RNN (LSTM, GRU, etc.) as a black box that takes in one input time sample at time n: x[n]. The RNN is in state s[n] at time n. Then, the output at time n is y[n], and it is a function of both the state s[n] and the input x[n]. The next state is also a function of these two variables, so one time-step of the RNN is:

y[n] = next_output(s[n], x[n])
s[n+1] = next_state(s[n], x[n])

In this sense, RNNs are always "stateful". However, the way keras and your GRU/LSTM code work, they take in a sequence of inputs {x[n]} for n = 0...N-1. During one call with this input sequence, the steps are run by the above equations, but the initial state is set to 0. In keras this is stateful=False behavior. Stateful=True behavior in keras is that the initial state is remembered from the last call.

Suppose you have x[0]...x[199] and your RNN takes in length-100 sequences. If you make two calls to RNN.predict, first using x[0:100] and then x[100:200], then with keras stateful=True, the second call has the initial state set to s[100] -- i.e., the final state from the first call. If stateful=False, then the second call starts with a zero state value. The use case I highlighted is just using a sequence length of 1 with stateful=True, which is useful in practice.

So, your current implementation effectively runs a "reset_states()" at the start of each call.
Yeah, I understand that. But that's an implementation detail of an individual layer class. The architecture is not concerned about that, except:
Yes, we would drop the
I don't yet see why we need this
Cool, having only
If
We just make these stateful private variables
Yes, I understand now. :) Maybe I'm missing something, but right now I don't see an actual problem/blocker here. I think we could simply start implementing it. Suggestion: I open a new branch called
Yes, this is the simplest. Just keep the states as private state variables in each (recurrent) layer and then, if stateful=False, call reset_states(). reset_states() can also be available at the model level. This is perfect and the simplest, most direct way to do it. I just did not understand how big of a deal it would be to have state in the layers.
great!
This sounds good. The changes should be small in the layers -- we can do LSTM and GRU.
The same as Keras does. If an LSTM layer is invoked with only one input tensor, that means no initial states. If there are 3 input tensors, then the last two represent the initial state. This happens here in the code.
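That input-count convention can be sketched like this (hypothetical helper name; units and shapes are illustrative):

```python
def split_lstm_inputs(tensors, units=4):
    """Mimic the convention: one input tensor means zero initial states;
    three input tensors mean the last two are the initial h and c states."""
    if len(tensors) == 1:
        x = tensors[0]
        h0 = [0.0] * units  # zero initial hidden state
        c0 = [0.0] * units  # zero initial cell state
    elif len(tensors) == 3:
        x, h0, c0 = tensors
    else:
        raise ValueError("expected 1 or 3 input tensors")
    return x, h0, c0
```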
I thought "99% of applications" don't require that, and simply setting it to all zeros is enough. Or did I misunderstand that? Maybe we just start with that simple approach and then see how it goes. I suggest the following game plan. Tobias:
Keith:
OK, done. I just created the new branch and pushed a commit with the architectural skeleton. Let me know if you feel blocked at some point or something is missing, etc. 🙂
OK. Thanks. I will start on it...
Yes, I don't have a use for setting the states to a specific value; I just thought that this functionality and reset_states() are so similar, you may want to combine them. The plan you posted is good. I may be a little slow on the uptake, but I will work on it and let you know...
Similar, but, maybe surprisingly, way more complex. Having something like
Currently, it's not clear to me how we could come up with meaningful indexing. We might need to expose the whole model architecture (graph + nesting) to the public API so that all layers can be traversed or found by some other means in order to set the state of a particular layer. If possible, I'd like to avoid increasing the "surface" of the library that drastically.
Thanks to the awesome work of @keithchugg, stateful models are now fully supported with the latest release, i.e.
Hey guys! I've been finally able to test the stateful LSTM in my C++ code and it works amazingly! Thank you very much! Very appreciated 😃
Hey @Dobiasd! I'm sorry to come back again on this issue, but I'm working on my old code with LSTMs after a long time, and I'm not able to adjust these two lines of code to the new tensor definitions:
Could you help me?
I'm afk right now. What error message do you get?
I'm getting that 'tensor5' in the first line is not a member of fdeep. The issue apparently is there, in the first line. I don't know if I should use 'tensor' or 'tensors' as input, and especially how to substitute shape5, which doesn't exist anymore in the library. Removing the * doesn't help.
The following two parts of the FAQ (together with the usage example in the readme) cover your use-case:
Thanks! Apparently, I've been able to compile by doing the following:
but then I get this error at runtime:
I don't get the meaning of [2(Nothing, Just 3)], while the second pair of brackets should be right, given my t tensor.
Since your problem is not related to LSTM layers, I guess this is not the right place to discuss it. Also, it might spam the other participants of this thread with notifications that are of no interest to them. Basically, you're not providing the right input shape for your model. Try to only provide sizes for the dimensions that are actually used, i.e., the last two. In case of further problems, please open a separate issue.
It would be very nice to also support LSTM layers