Conversation
python/mxnet/rnn/rnn_cell.py
Outdated
def __call__(self, inputs, states, params, prefix=''):
    W = params.get('%si2h_weight'%prefix)
    B = params.get('%si2h_bias'%prefix)
    U = params.get('%sh2h_weight'%prefix)
Would it be better to put the params + prefix in the __init__ of the RNNCell? Then we would not need to manually set the prefix when calling the one-step RNN.
No, because when you put cells into a StackedRNNCell, it needs to use a different prefix for each layer of the stack.
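A minimal sketch of the issue, using the RNNCell/RNNParams interface from the diff above (the constructor arguments and the stack0_/stack1_ prefixes are illustrative, not the PR's actual code):

cell0 = RNNCell(num_hidden=10)
cell1 = RNNCell(num_hidden=10)
params = RNNParams()

def stacked_step(inputs, states):
    # Layer 0 and layer 1 must use distinct prefixes; a shared prefix
    # would make params.get() hand both layers the same i2h/h2h weights.
    out0, s0 = cell0(inputs, states[0], params, prefix='stack0_')
    out1, s1 = cell1(out0, states[1], params, prefix='stack1_')
    return out1, [s0, s1]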
OK. Previously I expected the usage of the RNNCell to be like the following:
rnn1 = RNNCell(num_hidden=10, activation="tanh", name="rnn1")
state = rnn1.begin_state()
for i in range(10):
    out, state = rnn1(inputs=dat[i], states=state)
We first define an RNNCell and then repeatedly apply it to make sure that the parameters are shared.
That works too, though we need to be clear that cells should not be copied to form stacks.
I did it this way because TF did.
I don't have strong feelings one way or another.
Should we support variable scope like TF? I find that they manage the weights/biases using the scope name.
I think prefix + params is good enough. It aligns better with MXNet's name matching.
I feel that we do not need both the params and the prefix to determine the weights and biases; we could derive the names purely from the prefix. With the current design, we will need to create an RNNParams object every time we use an RNN. Could we design it as an internal registry?
Nevertheless, I think it's acceptable to keep the additional parameter.
python/mxnet/rnn/rnn_cell.py
Outdated
def output_shape(self):
    return (0, self._num_hidden)

def __call__(self, inputs, states, params, prefix=''):
Suggest the default prefix be "rnn".
Seems like a good start. I'd like to see how the cuDNN symbol gets used in this framework, and also what the application-level code looks like for problems like sequence classification or sequence-to-sequence, with and without attention. And then maybe we build utility classes for constructing this kind of network as well. But this level of interface probably needs to underlie those application-level modules.
python/mxnet/rnn/rnn_cell.py
Outdated
from .. import ndarray
from ..base import numeric_types, string_types

class RNNParams(object):
Suggested comment:
"""This class holds the learnable parameters (weights, biases) for the RNN.
New params are added as needed with get()
"""
python/mxnet/rnn/rnn_cell.py
Outdated
def begin_state(self, prefix='', init_sym=symbol.zeros, **kwargs):
    """initial state"""
    state_shape = self.state_shape
    def recursive(shape, c):
I don't understand what this is doing. Can you add a comment? It looks like the state_shape property is overloaded to support multiple different types; we should document what the options are and why.
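For what it's worth, here is a guess at what the method is doing, assuming state_shape may be either a plain shape tuple or a nested structure of shape tuples, and that init_sym accepts name and shape keywords as symbol.zeros does:

def begin_state(self, prefix='', init_sym=symbol.zeros, **kwargs):
    """Create initial state symbol(s) matching self.state_shape."""
    state_shape = self.state_shape

    def recursive(shape, c):
        # Leaf: a tuple of ints is a single state's shape; create one
        # zero-initialized symbol with a unique, prefixed name.
        if isinstance(shape, tuple) and all(isinstance(d, int) for d in shape):
            c[0] += 1
            return init_sym(name='%sbegin_state_%d' % (prefix, c[0]),
                            shape=shape, **kwargs)
        # Otherwise it is a nested structure (e.g. one shape per
        # internal state of an LSTM, or per layer of a stack); recurse.
        return [recursive(s, c) for s in shape]

    return recursive(state_shape, [0])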
python/mxnet/rnn/rnn_cell.py
Outdated
self._counter = 0

@property
def state_shape(self):
Suggested comment:
"""LSTM has two internal states, typically called "cell" and "hidden". Here we're setting them both to the same size."""
python/mxnet/rnn/rnn_cell.py
Outdated
class StackedRNNCell(BaseRNNCell):
    """Stacked multple rnn cels"""
multple -> multiple
This takes a single RNN cell and returns a simple stack of it? No: it takes a list of RNN cells and puts them together into a stack by wiring the outputs to the inputs. We should say that.
Also, that seems like a lot of work for somebody to have to do to use a stacked LSTM. They should be able to do this more easily, like with a single constructor. This class seems more like an "RNNCellStacker" than a "StackedRNNCell". A sketch of the wiring follows.
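For illustration, the list-based stacking might look roughly like this (the constructor and the per-layer prefix scheme are assumptions, not the PR's actual code):

class StackedRNNCell(BaseRNNCell):
    """Stacks a list of RNN cells: each cell's output is wired to the
    next cell's input, and each layer keeps its own state."""
    def __init__(self, cells):
        self._cells = cells

    def __call__(self, inputs, states, params, prefix=''):
        next_states = []
        for i, cell in enumerate(self._cells):
            # A per-layer sub-prefix keeps each layer's weights distinct.
            inputs, state = cell(inputs, states[i], params,
                                 prefix='%sstack%d_' % (prefix, i))
            next_states.append(state)
        return inputs, next_states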
@@ -48,6 +48,10 @@ def __repr__(self):
        return '<%s %s>' % (self.__class__.__name__,
                            'Grouped' if name is None else name)

    def __iter__(self):
This is a pretty fundamental change. Maybe this should be a named method like "outputs"?
It's a bug fix. Previously, for x in symbol would fail.
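The fix is presumably along these lines (list_outputs() and integer indexing are existing Symbol APIs; the exact body is an assumption):

def __iter__(self):
    """Allow `for x in symbol` by yielding each output symbol."""
    return (self[i] for i in range(len(self.list_outputs())))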
src/operator/operator_common.h
Outdated
@@ -67,26 +67,72 @@ struct InferTypeError {
      : msg(msg), index(index) {}
};

/*! \brief check if shape is empty or contains unkown (0) dim. */
unkown -> unknown
From my point of view, one way to support cuDNN optimization is to enable multi-step forwarding (e.g., add an argument in https://github.com/ML-HK/mxnet/blob/master/python/mxnet/recurrent.py#L90-L120). cuDNN would then be used as a fused_rnn function.
A few minor comments.
def tokenize_text(fname, vocab=None, invalid_label=-1, start_label=0):
    lines = open(fname).readlines()
with open(fname, 'r') as f:
    lines = f.readlines()
# pylint: disable=C0111,too-many-arguments,too-many-instance-attributes,too-many-locals,redefined-outer-name,fixme
# pylint: disable=superfluous-parens, no-member, invalid-name
import sys
sys.path.insert(0, "../../python")
Is this needed?
python/mxnet/rnn/io.py
Outdated
provide_data=[(self.data_name, data.shape)],
provide_label=[(self.label_name, label.shape)])
Clean up the empty lines?
Overall, it looks great. Several comments follow.
"""shape(s) of output""" | ||
return (0, self._num_hidden) | ||
|
||
def __call__(self, inputs, states): |
@piiswrong @pluskid Should we also add a "seq_len" or "input_length" argument to this __call__ function, like the "input_length" in Keras? https://github.com/fchollet/keras/blob/master/keras/layers/recurrent.py#L115-L123
You can do it with rnn_unroll.
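A minimal sketch of such unrolling (the helper name and signature are hypothetical; the per-step call matches the __call__ interface above):

def unroll(cell, inputs, begin_state, length):
    """Apply `cell` over `length` time steps, reusing the same cell
    (and thus the same parameters) at every step."""
    state = begin_state
    outputs = []
    for t in range(length):
        out, state = cell(inputs=inputs[t], states=state)
        outputs.append(out)
    return outputs, state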
The PR looks good to me.
I think it is better to have a unified RNN interface for cuDNN and non-cuDNN. But if that is making things too complicated, this looks good to me, too.
@pluskid That's very hard, given that the cuDNN RNN packs its weights.
I agree with Eric on the cuDNN part. It's really difficult to handle the weights when there is more than one layer. Also, is it possible to detect RNN-like patterns in the constructed symbol and use cuDNN-RNN as a kind of kernel fusion?
ndiscard = 0
self.data = [[] for _ in buckets]
for i in xrange(len(sentences)):
Note: this makes the otherwise Python 3 compatible code incompatible with Python 3.
Change xrange to range for Python 3.
@leopd @sxjscience
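For reference, a small shim keeps such loops working under both versions (the sentences list below is placeholder data); plain range also works in both, at the cost of materializing a list on Python 2:

import sys

if sys.version_info[0] >= 3:
    # Python 3 removed xrange; its range is already lazy.
    xrange = range

sentences = [['a', 'b'], ['c']]  # placeholder data
for i in xrange(len(sentences)):
    print(i, len(sentences[i]))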