
Incorrect implied shape inside loss function #8350

Closed
nickeleres opened this issue Oct 19, 2017 · 12 comments

Comments

@nickeleres

I've seen this brought up in a couple of other issues, but it hasn't been resolved as far as I know.

The data I am feeding into my loss function is of the following shape (batch size 32):

output.shape (32L, 51L)
label.shape (32L, 51L)

When the output and label are fed into the loss function, though, I get the following error:

MXNetError: Shape inconsistent, Provided=(32,51), inferred shape=(32,1)

Why is the loss function inferring an incorrect shape? On the line right before the loss call, Gluon knows the correct shape of each input matrix, but the loss function auto-infers the label shape incorrectly.
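A minimal sketch of the setup that seems to trigger this, assuming stock MXNet Gluon and the shapes from the report (the random `output` and the dummy one-hot `label` are stand-ins, not the actual model):

```python
import mxnet as mx
from mxnet import nd, gluon

batch_size, num_classes = 32, 51

# Stand-ins for the real network output and one-hot labels.
output = nd.random.normal(shape=(batch_size, num_classes))            # (32, 51) raw scores
label = nd.one_hot(nd.arange(batch_size) % num_classes, num_classes)  # (32, 51) one-hot

# With the default sparse_label=True, the loss expects class indices of
# shape (32,) (or (32, 1)), so a (32, 51) one-hot label produces a shape
# error like the "Provided=(32,51), inferred shape=(32,1)" one above.
softmax_cross_entropy = gluon.loss.SoftmaxCrossEntropyLoss()
loss = softmax_cross_entropy(output, label)  # raises a shape error
```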

@nickeleres nickeleres changed the title Implied shape when calculating loss is incorrect Incorrect implied shape inside loss function Oct 19, 2017
@nickeleres
Author

@pluskid closed a similar unresolved issue

#880

@nickeleres
Author

When I reshaped my label to (32, 1), as the error message suggested, I got training to run, but the alignment between the output batch and the associated label makes no sense... there is now a single label for an entire batch.

@zhreshold
Member

Please post your custom loss function

@nickeleres
Author

nickeleres commented Oct 19, 2017

No custom loss function

loss = softmax_cross_entropy(output, label)

It looks like the SoftmaxCrossEntropyLoss demo works the same way...

data.shape (32L, 784L)
label.shape (32L,)
output.shape (32L, 10L)

My labels are one-hot arrays... so it appears that the 0th entry of each label tensor (that's what shape (32,) is, right?) in the batch is used as the label for every training example in the batch, which doesn't make sense to me.
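For reference, a small sketch (not from the thread) of the two label formats: with the default sparse_label=True the loss wants one class index per example, shape (32,), which is what argmax over the one-hot axis gives, not the 0th entry of each one-hot row:

```python
from mxnet import nd

# Hypothetical one-hot labels for a batch of 32 examples and 51 classes.
one_hot_labels = nd.one_hot(nd.arange(32) % 51, 51)  # shape (32, 51)

# Sparse format: one class index per example, shape (32,).
sparse_labels = one_hot_labels.argmax(axis=1)

print(one_hot_labels.shape, sparse_labels.shape)  # (32, 51) (32,)
```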

@nickeleres
Author

nickeleres commented Oct 19, 2017

When I decrease the batch size to 1, the training labels are simply integers (which is actually the 0th entry in the one-hot array for each actual label), confirming my assumption above...

@zhreshold
Member

You can use gluon.loss.SoftmaxCrossEntropyLoss, where you can specify from_logits=True to use one-hot labels.

@nickeleres
Author

nickeleres commented Oct 19, 2017

This is my new loss function:

softmax_cross_entropy = gluon.loss.SoftmaxCrossEntropyLoss(from_logits=True)

And I'm still getting:

MXNetError: Shape inconsistent, Provided=(32,51), inferred shape=(32,1)

Note: batch size is 32, and one-hot length is 51

@nickeleres
Author

nickeleres commented Oct 19, 2017

I had to set sparse_label=False in my loss function, which now looks like:

gluon.loss.SoftmaxCrossEntropyLoss(sparse_label=False)
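A sketch of the resulting setup under the same assumptions as above (batch size 32, 51 classes, one-hot labels, a stand-in `output` in place of the real network):

```python
from mxnet import nd, gluon

batch_size, num_classes = 32, 51
output = nd.random.normal(shape=(batch_size, num_classes))            # (32, 51) raw scores
label = nd.one_hot(nd.arange(batch_size) % num_classes, num_classes)  # (32, 51) one-hot

# sparse_label=False tells the loss that `label` is a dense (one-hot)
# distribution over classes rather than a vector of class indices.
softmax_cross_entropy = gluon.loss.SoftmaxCrossEntropyLoss(sparse_label=False)
loss = softmax_cross_entropy(output, label)  # shape (32,): one loss value per example
```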

@zhreshold
Member

@nickeleres Just to mention that what I meant was sparse_label=False; from_logits is used when log_softmax has already been applied before the loss function.
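To illustrate the distinction, a small sketch under the same assumptions: from_logits=True only tells the loss that log-softmax has already been applied to the predictions; the label format is still controlled by sparse_label:

```python
from mxnet import nd, gluon

output = nd.random.normal(shape=(32, 51))   # raw scores
label = nd.one_hot(nd.arange(32) % 51, 51)  # one-hot labels

# Apply log-softmax ourselves, then tell the loss to skip its internal one.
log_probs = nd.log_softmax(output)
loss_fn = gluon.loss.SoftmaxCrossEntropyLoss(sparse_label=False, from_logits=True)
loss = loss_fn(log_probs, label)  # matches passing raw scores with from_logits=False
```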

@nickeleres
Author

Ok. So what is the explicit correct loss function for one-hot labels?

@zhreshold
Member

gluon.loss.SoftmaxCrossEntropyLoss(sparse_label=False)

@nickeleres
Author

Awesome, thank you so much
