Error in PickNegLogSoftmax #72
I'm trying to use pickneglogsoftmax_batch (using gpycnn, v2) and get the following error:

python: /home/vered/cnn/cnn/nodes.cc:1380: void cnn::PickNegLogSoftmax::forward_dev_impl(const MyDevice&, const std::vector<const cnn::Tensor*>&, cnn::Tensor&) const [with MyDevice = cnn::Device_GPU]: Assertion `pvals->size() == fx.d.batch_elems()' failed.

This happens with the following code, where the shape of batch_predictions is (batch_size * output_dim, 1). Other shapes (i.e. a matrix of batch_size x output_dim or its transpose) return a dimension error. Thanks in advance.
Actually, (batch_size * outputdim) is the wrong thing to do here. Each Tensor object has a special "batch" dimension, so it needs to be of size ([outputdim, 1], batch_size), where batch_size corresponds to this batch dimension. Creating lookup inputs in this shape is supported through lookup_batch, as in minibatch.py. If you actually want to create non-lookup input with minibatches, this is supported in the C++ version but perhaps not in the Python version yet? Right, @yoavg?
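For concreteness, a minimal sketch of the batched lookup being described, using only calls that appear later in this thread (the ids and sizes here are made up for illustration):

from pycnn import *

m = Model()
lp = m.add_lookup_parameters((100, 10))  # 100 entries, embedding size 10
word_ids = [3, 17, 42]                   # one id per sequence in the minibatch
emb = lookup_batch(lp, word_ids)         # a single expression of size ([10, 1], batch_size=3)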
Thanks! I did originally call lookup_batch, but then fed these vectors into LSTMs, concatenated their outputs, and used that as the input of another hidden layer, with some reshapes along the way. So it's either a lack of pycnn support or wrong usage on my side. Anyway, I'm currently using a workaround.
If that's the case, this shouldn't cause an error. If you can create a minimal example that reproduces the error, I can take a look and try to debug.
Here is an example code that produces an error:

from pycnn import *

m = Model()
lp = m.add_lookup_parameters((100, 10))
# Add parameters
W = parameter(m.add_parameters((5, 10)))
b = parameter(m.add_parameters((5,)))
builder = LSTMBuilder(1, 10, 10, m)
# Batch lookup (for every timestamp of the LSTM)
batch = [(1, 2, 3, 4, 5), (5, 4, 3, 2, 1), (1, 1, 2, 2, 3)]
minibatch_size = 3
seq_length = 5
inputs = [lookup_batch(lp, [batch[i][t] for i in range(minibatch_size)]) for t in range(seq_length)]
# Apply some transformations
lstm_outputs = builder.initial_state().transduce(inputs)[-1]
lstm_outputs = transpose(reshape(lstm_outputs, (10, minibatch_size)))  # reshape to (minibatch_size, 10)
output = colwise_add(W * transpose(lstm_outputs), b)
print 'output.npvalue().shape =', output.npvalue().shape, '(batch_size=3, output dimension=5)'
# Using pickneglogsoftmax_batch (fails with "what(): Bad input dimensions in PickNegLogSoftmax: [{5,3}]")
labels = [1, 2, 3]
print pickneglogsoftmax_batch(output, labels).npvalue()

The output size seems to be wrong after the operations I applied. Thanks!
OK, I've reproduced this and will try to fix it soon. Thanks for the easy-to-reproduce example.
Hmm, this is because of the reshape here. dynet treats the number of minibatches as a special dimension, so the result of the reshape needs to be not a normal (10 x minibatchsize) matrix, but a (10 x 1) matrix with the special "minibatch" dimension equal to minibatchsize. I'm not exactly sure how to do this in the python version though...
How would you do it in the C++ version?
Dim({10}, minibatchsize)
Are Dim({10}) and Dim({10}, 1) equivalent (i.e., batch size 1)?
Yes.
I pushed batch size support in reshape (and in Dim in general).

from dynet import *

m = Model()
lp = m.add_lookup_parameters((100, 10))
# Add parameters
W = parameter(m.add_parameters((5, 10)))
b = parameter(m.add_parameters((5,)))
builder = LSTMBuilder(1, 10, 10, m)
# Batch lookup (for every timestamp of the LSTM)
batch = [(1, 2, 3, 4, 5), (5, 4, 3, 2, 1), (1, 1, 2, 2, 3)]
minibatch_size = 3
seq_length = 5
inputs = [lookup_batch(lp, [batch[i][t] for i in range(minibatch_size)]) for t in range(seq_length)]
# Apply some transformations
lstm_outputs = builder.initial_state().transduce(inputs)[-1]
lstm_outputs = transpose(reshape(lstm_outputs, (10,), batch_size=minibatch_size))  # reshape to (10,) per batch element, keeping the batch dimension
output = colwise_add(W * transpose(lstm_outputs), b)
print 'output.npvalue().shape =', output.npvalue().shape, '(batch_size=3, output dimension=5)'
# Using pickneglogsoftmax_batch (previously failed with "what(): Bad input dimensions in PickNegLogSoftmax: [{5,3}]")
labels = [1, 2, 3]
print pickneglogsoftmax_batch(output, labels).npvalue()

Still need to document it, though. (Also, it may still be buggy when converting to a numpy array; this may need a bit more work.)
Thank you! The example code seems to work now.
Sorry, I might have closed this bug too soon. Though there is no exception now, I can't seem to get the loss to decrease between iterations (which previously worked well with my workaround). I printed the loss value at each epoch, and I'm not sure whether it's a bug in the conversion to numpy array or something in my code:

import numpy as np
from dynet import *

m = Model()
lp = m.add_lookup_parameters((100, 10))
# Add parameters
W = parameter(m.add_parameters((5, 10)))
builder = LSTMBuilder(1, 10, 10, m)
trainer = AdamTrainer(m)
num_epochs = 10
minibatch_size = 3
seq_length = 5
train_set = [(1, 1, 1, 1, 1), (2, 2, 2, 2, 2), (1, 1, 1, 1, 1)]
labels = [1, 2, 1]
for epoch in range(num_epochs):
    total_loss = 0.0
    # Use the entire small training set as a batch, just for the example
    batch_indices = np.random.permutation(len(train_set))
    # Batch lookup (for every timestamp of the LSTM), using the permuted order so inputs and labels line up
    inputs = [lookup_batch(lp, [train_set[batch_indices[i]][t] for i in range(minibatch_size)]) for t in range(seq_length)]
    # Apply some transformations
    lstm_outputs = builder.initial_state().transduce(inputs)[-1]
    lstm_outputs = reshape(lstm_outputs, (10,), batch_size=minibatch_size)  # reshape to (10,) with batch dimension = minibatch_size
    output = W * lstm_outputs
    # Compute the loss for this batch
    curr_labels = [labels[i] for i in batch_indices]
    loss = pickneglogsoftmax_batch(output, curr_labels)
    sum_loss = sum_batches(loss)
    total_loss += sum_loss.value()
    sum_loss.backward()
    trainer.update()
    trainer.update_epoch()
    total_loss /= len(labels)
    print 'Epoch', (epoch + 1), '/', num_epochs, 'Loss =', total_loss

The output:

Thanks!
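One aside about the loop above: it never calls renew_cg(), so every epoch keeps adding to the same computation graph, and the parameter(...) expression created before the loop belongs to that original graph. Whether or not that matters for the behaviour reported here, the usual dynet pattern is to renew the graph and re-wrap the parameters for every minibatch. A rough sketch of that pattern, reusing the toy data above (renew_cg and parameter are standard dynet calls; everything else is taken from the code above):

import numpy as np
from dynet import *

m = Model()
lp = m.add_lookup_parameters((100, 10))
pW = m.add_parameters((5, 10))  # keep the Parameters object; wrap it per graph
builder = LSTMBuilder(1, 10, 10, m)
trainer = AdamTrainer(m)

train_set = [(1, 1, 1, 1, 1), (2, 2, 2, 2, 2), (1, 1, 1, 1, 1)]
labels = [1, 2, 1]
minibatch_size = 3
seq_length = 5

for epoch in range(10):
    renew_cg()             # fresh computation graph for this batch
    W = parameter(pW)      # re-create the parameter expression in the new graph
    order = np.random.permutation(len(train_set))
    inputs = [lookup_batch(lp, [train_set[order[i]][t] for i in range(minibatch_size)])
              for t in range(seq_length)]
    h = builder.initial_state().transduce(inputs)[-1]
    h = reshape(h, (10,), batch_size=minibatch_size)
    output = W * h         # scores of size ([5], batch_size=3)
    curr_labels = [labels[order[i]] for i in range(minibatch_size)]
    sum_loss = sum_batches(pickneglogsoftmax_batch(output, curr_labels))
    loss_value = sum_loss.value()   # forward pass
    sum_loss.backward()
    trainer.update()
    trainer.update_epoch()
    print 'Epoch', epoch + 1, 'Loss =', loss_value / minibatch_size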
I've confirmed that this works now, due to the fix of #99.