
add cast floatX to avoid theano warning that cuda array can not cast from float64 or int64 #606

Closed · udibr wants to merge 11 commits
Conversation

@udibr (author) commented Apr 30, 2015

add cast floatX to avoid theano warning that cuda array can not cast from float64 or int64
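(For context: a minimal sketch, with hypothetical variable names, of the kind of explicit floatX cast the PR describes; it assumes floatX = theano.config.floatX and Theano's default upcasting rules.)

import theano
from theano import tensor

floatX = theano.config.floatX    # 'float32' on a typical GPU setup

y = tensor.lvector('y')          # int64 targets
y_hat = tensor.matrix('y_hat')   # floatX predictions

# int64 / float32 upcasts to float64 in Theano, and the old CUDA
# backend cannot handle float64, hence the warning this PR avoids
# by casting both operands to floatX explicitly.
errors = tensor.neq(y, y_hat.argmax(axis=1))
rate = tensor.sum(errors).astype(floatX) / y.shape[0].astype(floatX)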

@rizar (Contributor) commented May 1, 2015

How do I reproduce such a warning? I ran MNIST example on my GPU and everything was clean.

@udibr (author) commented May 13, 2015

Add Xavier weight initialization and BinaryMisclassificationRate, which is similar to MisclassificationRate but works with a binary (scalar) target.

@rizar (Contributor) commented May 14, 2015

@udibr, as I said in my previous message, I cannot reproduce the warning you report by running a MNIST demo. Can you shed some light on when it actually happens?

rizar mentioned this pull request May 15, 2015
class Xavier(NdarrayInitialization):
    """Initialize with Gaussian distribution with Xavier parameters.

    Use the following gaussian parameters: mean=0 and var=scale/Nin
Contributor:

Apparently you forgot a square root here. Also we would need a reference to the paper here.

udibr (author):

I've added a reference (and changed var to std so as not to confuse the reader). However, the original paper uses a uniform distribution and I am using a Gaussian, so I can either change my code, stop calling it Xavier, or keep it as is (most implementations use a Gaussian).
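(A minimal sketch of what the class under review could look like with the square root added, assuming blocks' NdarrayInitialization interface and theano.config.floatX; the scale parameter and the Gaussian choice follow this thread, while Glorot & Bengio (2010) originally use a uniform distribution.)

import numpy
import theano
from blocks.initialization import NdarrayInitialization

class Xavier(NdarrayInitialization):
    """Gaussian initialization with mean=0 and std=sqrt(scale/Nin)."""
    def __init__(self, scale=1.0):
        self.scale = scale

    def generate(self, rng, shape):
        n_in = shape[0]                      # fan-in
        std = numpy.sqrt(self.scale / n_in)  # the square root noted above
        return rng.normal(0, std, size=shape).astype(theano.config.floatX)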

@rizar (Contributor) commented May 15, 2015

Because this is sort of urgent, I opened a new PR, #635, for this issue to be merged right away. However, you can continue working on this one and we will be happy to merge it later, when it is ready.

@@ -58,7 +58,8 @@ def cost_matrix(self, y, y_hat):
 class CategoricalCrossEntropy(Cost):
     @application(outputs=["cost"])
     def apply(self, y, y_hat):
-        cost = tensor.nnet.categorical_crossentropy(y_hat, y).mean()
+        cost = tensor.nnet.categorical_crossentropy(y_hat, y).mean(
+            acc_dtype=floatX)
Contributor:

Why do you think we need this?

udibr (author):

It resolved, for me, the same error messages I got in MisclassificationRate regarding float64 and CUDA. I guess this happens when y is int64.

Contributor:

I think it will be fixed with Theano/Theano#2913.

I also checked that, given a float32 first argument and an int64 second argument, it returns float32, which is just what we need:

In [5]: tensor.nnet.categorical_crossentropy(numpy.array([[0.5, 0.5], [0.5, 0.5]], dtype='float32'), numpy.array([0, 1], dtype='int64')).eval()
Out[5]: array([ 0.69314718,  0.69314718], dtype=float32)
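(For illustration, a sketch with hypothetical variables of where acc_dtype enters: the per-example cross-entropy is already float32, but mean() accumulates float32 inputs in float64 by default, which the old CUDA backend cannot run; assumes floatX = theano.config.floatX.)

import theano
from theano import tensor

floatX = theano.config.floatX

y = tensor.lvector('y')          # int64 class labels
y_hat = tensor.matrix('y_hat')   # floatX probabilities

# Forcing the accumulator dtype keeps the whole graph in floatX.
cost = tensor.nnet.categorical_crossentropy(y_hat, y).mean(acc_dtype=floatX)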

@rizar (Contributor) commented May 17, 2015

Xavier initialization and the fix in reverse_words look good to me, but please rebase the PR against the latest master (see our development docs at http://blocks.readthedocs.org/en/latest/development/pull_request.html for instructions).

@@ -223,3 +223,33 @@ def generate(self, rng, shape):
                                       replace=False)
             weights[i, random_indices] = values[i]
         return weights
+
+
+class Xavier(NdarrayInitialization):
Contributor:

"Xavier" is the inventor's first name, so it's kind of weird to call it that. "Glorot" or "GlorotBengio" might be a better name...

    def apply(self, y, y_hat):
        # Here we have to cast both operands to floatX explicitly,
        # because int64 / float32 = float64 in Theano, unfortunately.
        return (tensor.sum(tensor.neq(y, y_hat > 0.5)).astype(floatX) /
                y.shape[0].astype(floatX))
Contributor:

This is almost a copy of MisclassificationRate, which is not nice.

You could add it as a second application method on MisclassificationRate, e.g. apply_binary, with the code they share moved to a private method.
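(A sketch of that suggestion; apply_binary is the name proposed above, while _error_rate is an illustrative name for the shared private method.)

import theano
from theano import tensor
from blocks.bricks.base import application
from blocks.bricks.cost import Cost

floatX = theano.config.floatX

class MisclassificationRate(Cost):
    @application(outputs=["error_rate"])
    def apply(self, y, y_hat):
        # Categorical case: compare labels against the argmax prediction.
        return self._error_rate(y, y_hat.argmax(axis=1))

    @application(outputs=["error_rate"])
    def apply_binary(self, y, y_hat):
        # Binary case: threshold the scalar prediction at 0.5.
        return self._error_rate(y, y_hat > 0.5)

    def _error_rate(self, y, predicted):
        # Cast both operands to floatX: int64 / float32 = float64 in Theano.
        return (tensor.sum(tensor.neq(y, predicted)).astype(floatX) /
                y.shape[0].astype(floatX))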

udibr (author):

I thought about doing it that way too, but then I noticed that the same file has both BinaryCrossEntropy and CategoricalCrossEntropy, so I thought it would be more consistent to have separate misclassification rates as well.

Contributor:

Good point. I do not think there is much sense in having them as separate bricks. @bartvm, @dwf, what do you think?

@rizar (Contributor) commented May 23, 2015

It seems like your pull request contains your whole master branch. Can you make it a bit clearer which of these changes you actually want to contribute to Blocks? It would also be better to have separate PRs for the initialization scheme, Adagrad, etc.

@udibr (author) commented May 24, 2015

You are right. I should have a separate branch for each feature instead of putting everything on master. I am closing this pull request and will try to fix this.

udibr closed this May 24, 2015