
[WIP] Tentative RNN Interface #4618

Merged: 9 commits merged into apache:master on Feb 4, 2017
Conversation

piiswrong (Contributor)

def __call__(self, inputs, states, params, prefix=''):
    W = params.get('%si2h_weight'%prefix)
    B = params.get('%si2h_bias'%prefix)
    U = params.get('%sh2h_weight'%prefix)
sxjscience (Member) commented Jan 10, 2017

Would it be better to put the params + prefix in the __init__ of the RNNCell, so we don't need to manually set the prefix when calling the one-step RNN?

piiswrong (Contributor, Author) commented Jan 10, 2017

No, because when you put cells into a StackedRNNCell it needs to use a different prefix for each layer of the stack.
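To make the collision concrete, here is a toy sketch; Params is a hypothetical stand-in for this PR's RNNParams, assumed to cache one Variable per name:

import mxnet as mx

class Params(object):
    """Toy stand-in for RNNParams: caches one Variable per name."""
    def __init__(self):
        self._cache = {}

    def get(self, name):
        if name not in self._cache:
            self._cache[name] = mx.symbol.Variable(name)
        return self._cache[name]

params = Params()
w0 = params.get('l0_i2h_weight')   # distinct per-layer prefixes...
w1 = params.get('l1_i2h_weight')   # ...give distinct parameters
assert w0 is not w1
a = params.get('i2h_weight')       # one shared prefix would make every
b = params.get('i2h_weight')       # layer resolve to the same weight
assert a is b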

sxjscience (Member) commented Jan 10, 2017

OK. Previously I expected the usage of the RNNCell to be like the following:

rnn1 = RNNCell(num_hidden=10, activation="tanh", name="rnn1")
state = rnn1.begin_state()
for i in range(10):
    out, state = rnn1(inputs=dat[i], states=state)

We first define an RNNCell and then repeatedly apply it to make sure that the parameters are shared.

piiswrong (Contributor, Author)

That works too, though we need to be clear that cells should not be copied to form stacks.
I did this because TF did it this way.
I don't have strong feelings one way or the other.

Member

Should we support variable scope like TF? I find that they manage the weights/biases using the scope name.

piiswrong (Contributor, Author)

I think prefix + param is good enough. It aligns better with MXNet's name matching.

sxjscience (Member) commented Jan 11, 2017

I feel that we do not need both the params and the prefix to determine the weights and biases; we can derive the names purely from the prefix. If we keep the current design, we will need to create the RNNParams every time we use an RNN. Could we design it as an inner registry?

Nevertheless, I think that it's acceptable to keep an additional parameter.

def output_shape(self):
    return (0, self._num_hidden)

def __call__(self, inputs, states, params, prefix=''):
Contributor

Suggest default prefix be "rnn"

leopd (Contributor) left a comment

Seems like a good start. I'd like to see how the cuDNN symbol gets used in this framework, and also what the application-level code looks like for problems like sequence classification, or sequence-to-sequence, all with and without attention. And then maybe we build utility classes for constructing this kind of network as well. But this level of interface probably needs to underlie those application-level modules.

from .. import ndarray
from ..base import numeric_types, string_types

class RNNParams(object):
Contributor

Suggest comment:
"""This class holds the learnable parameters (weights, biases) for the RNN.
New params are added as needed with get().
"""

def begin_state(self, prefix='', init_sym=symbol.zeros, **kwargs):
    """initial state"""
    state_shape = self.state_shape
    def recursive(shape, c):
Contributor

I don't understand what this is doing. Can you add a comment? It looks like the state_shape property is overloaded to support multiple different types. We should document what the options are and why.

    self._counter = 0

@property
def state_shape(self):
Contributor

Suggested comment:
"""LSTM has two internal states, typically called "cell" and "hidden". Here we're setting them both to the same size."""



class StackedRNNCell(BaseRNNCell):
    """Stacked multple rnn cels"""
Contributor

multple -> multiple

This takes a single RNN cell and returns a simple stack of it? No: this takes a list of RNN cells and puts them together to make a stack by wiring the outputs to the inputs. We should say that.

Also, that seems like a lot of work for somebody to do just to use a stacked LSTM. They should be able to do this more easily, e.g. with a single constructor. This class seems more like an "RNNCellStacker" than a "StackedRNNCell".
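As a rough illustration of that wiring (a hypothetical helper, not the PR's implementation), assuming the two-argument __call__(inputs, states) form used later in this PR:

def stack_step(cells, inputs, states):
    """One time step through a stack: the output of layer i becomes
    the input of layer i + 1, and each cell keeps its own state."""
    next_states = []
    x = inputs
    for cell, state in zip(cells, states):
        x, state = cell(x, state)
        next_states.append(state)
    return x, next_states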

@@ -48,6 +48,10 @@ def __repr__(self):
        return '<%s %s>' % (self.__class__.__name__,
                            'Grouped' if name is None else name)

    def __iter__(self):
Contributor

This is a pretty fundamental change. Maybe this should be a named method like "outputs"?

piiswrong (Contributor, Author)

It's a bug fix. Previously `for x in symbol` would fail.
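For instance, assuming __iter__ yields each output of the symbol, something like the following should now work:

import mxnet as mx

group = mx.symbol.Group([mx.symbol.Variable('a'), mx.symbol.Variable('b')])
for out in group:   # previously raised; now yields each output in turn
    print(out.name)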

@@ -67,26 +67,72 @@ struct InferTypeError {
    : msg(msg), index(index) {}
};

/*! \brief check if shape is empty or contains unkown (0) dim. */
Contributor

unkown -> unknown

sxjscience (Member)

From my point of view, one way to support the cuDNN optimization is to enable multi-step forwarding in the RNNCell (e.g., add an argument to __call__ called step_num or seq_len) and add a flag in __init__ that determines whether to use cuDNN. However, we will also need another operator to convert the biases and weights into the packed "parameter" of mx.sym.RNN.

https://github.com/ML-HK/mxnet/blob/master/python/mxnet/recurrent.py#L90-L120
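For illustration, a hedged sketch of what such a conversion might look like; pack_rnn_params is hypothetical, and the real ordering must follow cuDNN's internal parameter layout:

import mxnet as mx

def pack_rnn_params(weights, biases):
    """Hypothetical sketch: flatten per-layer, per-gate weight matrices
    and bias vectors into the single 1-D blob that the cuDNN-backed
    mx.sym.RNN operator consumes. The concatenation order here is
    arbitrary; cuDNN prescribes its own layout, which is exactly what
    makes a unified interface hard (see the discussion below)."""
    flat = [mx.symbol.Reshape(w, shape=(-1,)) for w in weights]
    flat += [mx.symbol.Reshape(b, shape=(-1,)) for b in biases]
    return mx.symbol.Concat(*flat, dim=0)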

piiswrong (Contributor, Author)

cuDNN will be used as a function, fused_rnn.

zarandioon left a comment

A few minor comments.



def tokenize_text(fname, vocab=None, invalid_label=-1, start_label=0):
    lines = open(fname).readlines()


with open(fname, 'r') as f:
    lines = f.readlines()

# pylint: disable=C0111,too-many-arguments,too-many-instance-attributes,too-many-locals,redefined-outer-name,fixme
# pylint: disable=superfluous-parens, no-member, invalid-name
import sys
sys.path.insert(0, "../../python")


Is this needed?

provide_data=[(self.data_name, data.shape)],
provide_label=[(self.label_name, label.shape)])


Contributor

Clean up empty lines?

pluskid (Contributor) commented Jan 26, 2017

Overall, it looks great. Several comments:

  1. The handling of init states is much better than the previous ad hoc way. However, I guess the user needs to re-construct the symbol, instead of just loading it from JSON, in order to do inference, because the init states will be fed as inputs instead of mx.zeros. Is there some way to make this easier?

  2. The current interface handles time steps as a list, using SliceChannel to split and later re-merge (see the sketch after this list). I'm wondering if there is a performance issue here (e.g. for applications with ~2k sequence length). Would it be better to use a fully packed tensor, making one of the dimensions the time axis (i.e. time-major / batch-major data layout)? One benefit is that the cuDNN cell uses this layout, so it might be easier to wrap. Another benefit is that, symbolically, an RNN with seq-len 10 and seq-len 20 will be exactly the same, just bound with different input shapes.

  3. Currently there seems to be a simple one-layer RNN cell and then a sequential RNN cell that can contain a stack of RNN cells. I'm wondering whether we need to expose so many different concepts to the user. With commonly used wrappers, in both the tensorflow interface and the theano interface, the RNN cell is a single function that can be used to create multiple layers with a single call. Plus, it can be mixed with other operators without needing to explicitly call unroll. Our own cuDNN RNN Cell demo is an example that I think keeps the syntax close to calling any other mxnet operator.
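For concreteness, a minimal sketch of the two layouts in point 2 (shapes and sequence length illustrative):

import mxnet as mx

# List-of-steps layout (this PR): split a packed input into per-step
# symbols with SliceChannel, process each step, then re-merge.
data = mx.symbol.Variable('data')   # assume shape (batch, seq_len=20, dim)
steps = mx.symbol.SliceChannel(data, num_outputs=20, axis=1, squeeze_axis=1)
# Packed layout (what cuDNN consumes): keep the single tensor with time on
# one axis; the symbol is then identical for seq-len 10 and 20, and only
# the shape it is bound with differs.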

"""shape(s) of output"""
return (0, self._num_hidden)

def __call__(self, inputs, states):
Member

@piiswrong @pluskid Should we also add a "seq_len" or "input_length" argument to this __call__ function? Like the "input_length" in Keras https://github.com/fchollet/keras/blob/master/keras/layers/recurrent.py#L115-L123.

piiswrong (Contributor, Author)

You can do it with rnn_unroll.
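For context, a minimal sketch of what an unroll helper does conceptually (names hypothetical; rnn_unroll's actual signature isn't shown in this PR):

def unroll(cell, inputs, begin_states):
    """Apply `cell` once per time step. Parameters are shared across
    steps because the same cell, and thus the same param symbols,
    is reused."""
    states = begin_states
    outputs = []
    for x in inputs:   # `inputs` is a list of per-step symbols
        out, states = cell(x, states)
        outputs.append(out)
    return outputs, states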

piiswrong (Contributor, Author) commented Jan 26, 2017

  1. Why would the init state be non-zero for inference? @pluskid
  2. The current interface allows more flexibility. You can easily unroll it with rnn_unroll. The cuDNN RNN will be called FusedRNN. I haven't decided if it should be a Cell or a function; a function is probably more natural in this case.
  3. This can be a higher-level interface.

mz24cn commented Jan 28, 2017

pluskid (Contributor) commented Jan 28, 2017

@piiswrong

  1. I'm talking about step-by-step inference, where one needs to do sampling at each time step and feed that as input to the next time step. The state needs to be forwarded, too. Another situation is implementing truncated BPTT, where the backward pass is truncated to a fixed number of steps but the forward state is kept. See our speech demo for an existing implementation of BPTT.

  2. So the cuDNN backend will be wrapped in a different operator?

piiswrong (Contributor, Author)

@pluskid

  1. I see. You can do that by calling begin_state(init_sym=sym.Variable); I'll document this (see the sketch after this list).
  2. Yes.
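A sketch of that pattern, assuming begin_state forwards init_sym to construct each state symbol (RNNCell is the cell from this PR; the cell and data names are illustrative):

import mxnet as mx

cell = RNNCell(num_hidden=10, activation="tanh", name="rnn1")
# States become free Variables rather than zeros, so at inference time
# they can be bound to the previous step's evaluated output states:
states = cell.begin_state(init_sym=mx.symbol.Variable)
out, next_states = cell(inputs=mx.symbol.Variable('data'), states=states)
# Feed the values of `next_states` back in as the state inputs at t + 1.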

sxjscience (Member)

The PR looks good to me.

pluskid (Contributor) commented Jan 30, 2017

I think it is better to have a unified RNN interface for cuDNN and non-cuDNN. But if that is making things too complicated, this looks good to me, too.

piiswrong (Contributor, Author)

@pluskid That's very hard, given that the cuDNN RNN packs its weights.

sxjscience (Member)

I agree with Eric on the cuDNN part. It's really difficult to handle the weights when there is more than one layer. Also, is it possible to detect RNN-like patterns in the constructed symbol and use cuDNN-RNN as a kind of kernel fusion?

piiswrong merged commit 8b9d909 into apache:master on Feb 4, 2017

ndiscard = 0
self.data = [[] for _ in buckets]
for i in xrange(len(sentences)):
Contributor

Note: this makes the otherwise Python 3 compatible code incompatible with Python 3.


Change xrange to range for Python 3.
