
Error in PickNegLogSoftmax #72

Closed
vered1986 opened this issue Sep 8, 2016 · 15 comments
Assignees
Labels
minor bug Bugs that aren't too bad, only concern documentation, or have easy work-arounds

Comments

@vered1986
Contributor

I'm trying to use pickneglogsoftmax_batch (using gpycnn, v2) and get the following error:

python: /home/vered/cnn/cnn/nodes.cc:1380: void cnn::PickNegLogSoftmax::forward_dev_impl(const MyDevice&, const std::vector<const cnn::Tensor*>&, cnn::Tensor&) const [with MyDevice = cnn::Device_GPU]: Assertion `pvals->size() == fx.d.batch_elems()' failed.

With the following code:

loss = pickneglogsoftmax_batch(batch_predictions, batch_labels)

where the shape of batch_predictions is (batch_size * output_dim, 1). Other shapes (e.g. a batch_size x output_dim matrix or its transpose) raise a dimension error.

Thanks in advance.

@neubig
Contributor

neubig commented Sep 9, 2016

Actually, (batch_size * output_dim) is the wrong shape here. Each Tensor object has a special "batch" dimension, so it needs to be of size ([output_dim, 1], batch_size), where batch_size corresponds to this batch dimension. Creating lookup inputs in this shape is supported through lookup_batch, as in minibatch.py. If you actually want to create non-lookup input with minibatches, this is supported in the C++ version, but perhaps not in the Python version yet? Right, @yoavg?
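[Editor's note: the semantics described above can be sketched outside dynet. The following is an illustrative pure-Python reference for what a batched pick-negative-log-softmax computes, assuming each batch element is represented as a plain list of class scores; it is not dynet's API, which takes a single tensor with the batch dimension.]

```python
import math

def pick_neg_log_softmax_batch(scores, labels):
    """For each batch element (a list of class scores), return
    -log softmax(scores)[label].

    Illustrative sketch only: dynet's real pickneglogsoftmax_batch
    operates on one tensor whose batch dimension holds the elements."""
    losses = []
    for col, label in zip(scores, labels):
        m = max(col)  # shift by the max for numerical stability
        log_z = m + math.log(sum(math.exp(s - m) for s in col))
        losses.append(log_z - col[label])
    return losses

# batch_size=3 elements with output_dim=5 scores each, one label per element
batch_scores = [[0.1, 0.2, 0.3, 0.4, 0.5]] * 3
print(pick_neg_log_softmax_batch(batch_scores, [1, 2, 1]))
```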

@vered1986
Contributor Author

Thanks!

I did originally call lookup_batch, but then fed these vectors into LSTMs, concatenated their outputs, and used that as the input of another hidden layer, with some reshapes along the way. So it's either missing pycnn support or wrong usage on my side.

Anyway, I'm currently calling pickneglogsoftmax for each instance and summing the results, which works well, though I suppose it's slower.

@neubig
Contributor

neubig commented Sep 9, 2016

If that's the case, this shouldn't cause an error. If you can create a minimal example that reproduces the error I can take a look and try to debug.

@vered1986
Contributor Author

Here is an example code that produces an error:

from pycnn import *

m = Model()
lp = m.add_lookup_parameters((100, 10))

# Add parameters
W = parameter(m.add_parameters((5, 10)))
b = parameter(m.add_parameters((5)))
builder = LSTMBuilder(1, 10, 10, m)

# Batch lookup (for every timestamp of the LSTM)
batch = [(1, 2, 3, 4, 5), (5, 4, 3, 2, 1), (1, 1, 2, 2, 3)]
minibatch_size = 3
seq_length = 5
inputs = [lookup_batch(lp, [batch[i][t] for i in range(minibatch_size)]) for t in range(seq_length)]

# Apply some transformations
lstm_outputs = builder.initial_state().transduce(inputs)[-1]
lstm_outputs = transpose(reshape(lstm_outputs, (10, minibatch_size))) # reshape to (minibatch_size, 10)
output = colwise_add(W * transpose(lstm_outputs), b)

print 'output.npvalue().shape =', output.npvalue().shape, '(batch_size=3, output dimension=5)'

# Using pickneglogsoftmax_batch (fails with "what():  Bad input dimensions in PickNegLogSoftmax: [{5,3}]")
labels = [1, 2, 3]
print pickneglogsoftmax_batch(output, labels).npvalue()

The output size seems to be wrong after the operations I applied, so pickneglogsoftmax_batch raises an error (and again, if I reshape output to (15, 1) I get the first error I mentioned).

Thanks!

@neubig neubig added major bug Issues that silently cause incorrect results, break installation on common environments, etc. and removed need more info labels Sep 9, 2016
@neubig
Contributor

neubig commented Oct 11, 2016

OK, I've reproduced this and will try to fix it soon. Thanks for the easy-to-reproduce example.

@neubig neubig self-assigned this Oct 11, 2016
@neubig
Contributor

neubig commented Oct 12, 2016

Hmm, this is because of the reshape here. dynet treats the number of minibatches as a special dimension, so the result of the reshape should not be a normal (10 x minibatch_size) matrix, but a (10 x 1) matrix with the special "minibatch" dimension equal to minibatch_size. I'm not exactly sure how to do this in the python version, though...
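[Editor's note: one way to see the distinction is that the two shapes can share the same flat buffer, assuming column-major storage (dynet is backed by Eigen, which is column-major by default) and batch elements stored contiguously one after another. Under that assumption, the columns of a {10, minibatch_size} matrix and the batch elements of a {10}-vector with batch dimension minibatch_size are the same slices of memory; the fix only changes the dim metadata. A pure-Python sketch:]

```python
# Flat buffer of 30 values: a 10-dim output for each of 3 batch elements.
# Assumption: column-major matrix layout coincides with storing each
# batch element contiguously, as in dynet's Eigen-backed tensors.
dim, batch_size = 10, 3
flat = list(range(dim * batch_size))

# View 1: a {10, 3} matrix -- the columns of a single matrix.
matrix_cols = [flat[c * dim:(c + 1) * dim] for c in range(batch_size)]

# View 2: a {10} vector with batch dimension 3 -- independent batch elements.
batch_elems = [flat[b * dim:(b + 1) * dim] for b in range(batch_size)]

# Same numbers either way; only the dim metadata differs, which is why
# pickneglogsoftmax accepts one view and rejects the other.
assert matrix_cols == batch_elems
```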

@neubig neubig added minor bug Bugs that aren't too bad, only concern documentation, or have easy work-arounds and removed major bug Issues that silently cause incorrect results, break installation on common environments, etc. labels Oct 12, 2016
@yoavg
Contributor

yoavg commented Oct 12, 2016

How would you do it in the cpp version?

@neubig
Contributor

neubig commented Oct 12, 2016

Dim({10}, minibatchsize)

@yoavg
Contributor

yoavg commented Oct 12, 2016

Are Dim({10}) and Dim({10}, 1) equivalent? (batch size 1?)


@neubig
Contributor

neubig commented Oct 12, 2016

Yes.

@yoavg
Contributor

yoavg commented Oct 12, 2016

I pushed batch-size support to reshape (and to Dim in general).
The code now works with the following change:

from dynet import *

m = Model()
lp = m.add_lookup_parameters((100, 10))

# Add parameters
W = parameter(m.add_parameters((5, 10)))
b = parameter(m.add_parameters((5)))
builder = LSTMBuilder(1, 10, 10, m)

# Batch lookup (for every timestamp of the LSTM)
batch = [(1, 2, 3, 4, 5), (5, 4, 3, 2, 1), (1, 1, 2, 2, 3)]
minibatch_size = 3
seq_length = 5
inputs = [lookup_batch(lp, [batch[i][t] for i in range(minibatch_size)]) for t in range(seq_length)]

# Apply some transformations
lstm_outputs = builder.initial_state().transduce(inputs)[-1]
lstm_outputs = transpose(reshape(lstm_outputs, (10,), batch_size=minibatch_size)) # reshape to (minibatch_size, 10)
output = colwise_add(W * transpose(lstm_outputs), b)

print 'output.npvalue().shape =', output.npvalue().shape, '(batch_size=3, output dimension=5)'

# Using pickneglogsoftmax_batch (fails with "what():  Bad input dimensions in PickNegLogSoftmax: [{5,3}]")
labels = [1, 2, 3]
print pickneglogsoftmax_batch(output, labels).npvalue()

Still need to document it, though.

(also, it may still be buggy when converting to a numpy array; this may need a bit more work)

@vered1986
Contributor Author

Thank you! The example code seems to work now.

@vered1986
Contributor Author

vered1986 commented Oct 15, 2016

Sorry, I might have closed this bug too soon. Though there is no exception now, I can't get the loss to decrease between iterations (which previously worked well with my workaround). I printed the value of lstm_outputs after the reshape, and it is a zero matrix. This also happens in the example code.

I'm not sure whether it's a bug in the conversion to a numpy array (when calling print lstm_outputs.npvalue()) or these are the actual values of the matrix. In the latter case, it might be that I'm simply not doing the update as I should (I've made an "educated guess" based on several C++ examples). Here is an example code that reproduces this problem:

import numpy as np
from dynet import *

m = Model()
lp = m.add_lookup_parameters((100, 10))

# Add parameters
W = parameter(m.add_parameters((5, 10)))
builder = LSTMBuilder(1, 10, 10, m)

trainer = AdamTrainer(m)
num_epochs = 10
minibatch_size = 3
seq_length = 5
train_set = [(1, 1, 1, 1, 1), (2, 2, 2, 2, 2), (1, 1, 1, 1, 1)]
labels = [1, 2, 1]

for epoch in range(num_epochs):

    total_loss = 0.0

    # Use the entire small training set as a batch, just for the example
    batch_indices = np.random.permutation(len(train_set))

    # Batch lookup (for every timestamp of the LSTM)
    inputs = [lookup_batch(lp, [train_set[i][t] for i in range(minibatch_size)]) for t in range(seq_length)]

    # Apply some transformations
    lstm_outputs = builder.initial_state().transduce(inputs)[-1]
    lstm_outputs = reshape(lstm_outputs, (10,), batch_size=minibatch_size) # reshape to (minibatch_size, 10)
    output = W * lstm_outputs

    # Compute the loss for this batch
    curr_labels = [labels[i] for i in batch_indices]
    loss = pickneglogsoftmax_batch(output, curr_labels)

    sum_loss = sum_batches(loss)

    total_loss += sum_loss.value()
    sum_loss.backward()
    trainer.update()

    trainer.update_epoch()
    total_loss /= len(labels)
    print 'Epoch', (epoch + 1), '/', num_epochs, 'Loss =', total_loss

The output:

Epoch 1 / 10 Loss = 1.6094379425
Epoch 2 / 10 Loss = 1.6094379425
Epoch 3 / 10 Loss = 1.6094379425
Epoch 4 / 10 Loss = 1.6094379425
Epoch 5 / 10 Loss = 1.6094379425
Epoch 6 / 10 Loss = 1.6094379425
Epoch 7 / 10 Loss = 1.6094379425
Epoch 8 / 10 Loss = 1.6094379425
Epoch 9 / 10 Loss = 1.6094379425
Epoch 10 / 10 Loss = 1.6094379425

Thanks!
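[Editor's note: the reported constant is itself a clue. 1.6094379425 is ln(5) rounded to float32 precision, i.e. the negative log-probability of any one of 5 classes under a uniform softmax, which is exactly what zero logits (W times a zero lstm_outputs) would produce. A quick stdlib check, assuming only that the reported loss is a per-example -log softmax value returned in single precision:]

```python
import math
import struct

# -log softmax of any class under zero logits over 5 classes is log(5).
uniform_loss = math.log(5)

# Round-trip through float32, since dynet computes in single precision.
as_float32 = struct.unpack('f', struct.pack('f', uniform_loss))[0]
print(as_float32)  # matches the constant loss printed above
```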

@vered1986 vered1986 reopened this Oct 15, 2016
@yoavg
Contributor

yoavg commented Oct 16, 2016

This is likely related to #99.
@neubig, can you take a look?

@neubig
Contributor

neubig commented Oct 19, 2016

I've confirmed that this works now, due to the fix for #99.

@neubig neubig closed this as completed Oct 19, 2016