I have a question about rnn code #8
Comments
Yes, BasicLSTMCell requires you to provide the state parameters together, for computational efficiency. During calculation, they will be split:

```python
# Parameters of gates are concatenated into one multiply for efficiency.
c, h = tf.split(1, 2, state)
```

Note: this is specific to the LSTM cell; for a SimpleRNN or GRU cell you don't need to provide 2*num_units, just num_units. For more info, you can have a look at the BasicLSTMCell class defined in rnn_cell.py:

```python
class BasicLSTMCell(RNNCell):
  """Basic LSTM recurrent network cell.

  The implementation is based on: http://arxiv.org/pdf/1409.2329v5.pdf.

  It does not allow cell clipping, a projection layer, and does not
  use peep-hole connections: it is the basic baseline.

  Biases of the forget gate are initialized by default to 1 in order to reduce
  the scale of forgetting in the beginning of the training.
  """

  def __init__(self, num_units, forget_bias=1.0):
    self._num_units = num_units
    self._forget_bias = forget_bias

  @property
  def input_size(self):
    return self._num_units

  @property
  def output_size(self):
    return self._num_units

  @property
  def state_size(self):
    return 2 * self._num_units

  def __call__(self, inputs, state, scope=None):
    """Long short-term memory cell (LSTM)."""
    with tf.variable_scope(scope or type(self).__name__):  # "BasicLSTMCell"
      # Parameters of gates are concatenated into one multiply for efficiency.
      c, h = tf.split(1, 2, state)
      concat = linear.linear([inputs, h], 4 * self._num_units, True)
      # i = input_gate, j = new_input, f = forget_gate, o = output_gate
      i, j, f, o = tf.split(1, 4, concat)
      new_c = c * tf.sigmoid(f + self._forget_bias) + tf.sigmoid(i) * tf.tanh(j)
      new_h = tf.tanh(new_c) * tf.sigmoid(o)
    return new_h, tf.concat(1, [new_c, new_h])
```
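For illustration, here is a minimal sketch of what that 2 * num_units state looks like in practice. It assumes the same old rnn_cell API as the excerpt above, and the batch_size / n_hidden values are only illustrative:

```python
import tensorflow as tf
from tensorflow.models.rnn import rnn_cell

batch_size, n_hidden = 32, 128  # illustrative values

lstm = rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0)
print(lstm.state_size)  # 2 * n_hidden = 256: c and h packed side by side

# The state fed to the cell must therefore have 2 * n_hidden columns.
state = tf.zeros([batch_size, lstm.state_size])   # shape (32, 256)
x = tf.zeros([batch_size, n_hidden])              # one time step of input
output, state = lstm(x, state)                    # state stays (32, 256)

# A GRU keeps a single hidden vector, so its state is just n_hidden wide.
gru = rnn_cell.GRUCell(n_hidden)
print(gru.state_size)  # n_hidden = 128
```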
I have one more question. In this example:

```python
# input shape: (batch_size, n_steps, n_input)
_X = tf.transpose(_X, [1, 0, 2])  # permute n_steps and batch_size
# Reshape to prepare input to hidden activation
_X = tf.reshape(_X, [-1, n_input])  # (n_steps*batch_size, n_input)
# Linear activation
_X = tf.matmul(_X, _weights['hidden']) + _biases['hidden']
# Define a lstm cell with tensorflow
lstm_cell = rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0)
# Split data because rnn cell needs a list of inputs for the RNN inner loop
_X = tf.split(0, n_steps, _X)  # n_steps * (batch_size, n_hidden)
```

But in the example provided by Google (see ptb_word_lm.py in tensorflow/models/rnn/ptb/):

```python
self._input_data = tf.placeholder(tf.int32, [batch_size, num_steps])
```

and then:

```python
with tf.device("/cpu:0"):
  embedding = tf.get_variable("embedding", [vocab_size, size])
  inputs = tf.nn.embedding_lookup(embedding, self._input_data)
```

they do nothing but the embedding. Can you tell me the difference? Your acknowledgement is appreciated.
In this case, we are classifying the MNIST dataset using an RNN, by treating every image as a sequence of 28 steps, where each step is a row of 28 pixel values. Here we do not need an embedding, because we can already "compare" pixels directly (every sequence element is a real-valued gray-scale intensity). But in the TensorFlow example, they need an embedding, because their sequence is a list of integers (word indices in a dictionary), and you can't really "compare" those directly (they are just IDs). In practice, embedding can greatly improve performance in NLP.
Extracted from https://www.tensorflow.org/versions/master/tutorials/recurrent/index.html#lstm
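To make the difference concrete, here is a small sketch contrasting the two kinds of input. The shapes and names (batch_size, n_steps, n_input, num_steps, vocab_size, size) are only illustrative:

```python
import tensorflow as tf

batch_size, n_steps, n_input = 128, 28, 28    # MNIST: 28 rows of 28 pixels
num_steps, vocab_size, size = 20, 10000, 200  # PTB-style word model

# MNIST: each step of the sequence is already a real-valued vector
# (one row of pixel intensities), so it can be fed to the RNN directly.
pixels = tf.placeholder(tf.float32, [batch_size, n_steps, n_input])

# PTB: each step is an integer word ID, which cannot be compared directly,
# so it is first mapped to a dense vector with an embedding lookup.
word_ids = tf.placeholder(tf.int32, [batch_size, num_steps])
embedding = tf.get_variable("embedding", [vocab_size, size])
word_vectors = tf.nn.embedding_lookup(embedding, word_ids)  # (batch_size, num_steps, size)
```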
Thank you for your prompt reply.
Why do I have to use 2*n_hidden?
Does it mean that, when using an LSTM cell, tensorflow.models.rnn requires us to provide the initial hidden state and the initial cell state together?